AITopics

2607.00224

Country: North America > United States > Illinois > Cook County > Chicago (0.40)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)

Chandramoorthy, Nisha, Sanz-Alonso, Daniel, Waniorek, Nathan

From Spectral Methods to Sample Complexity Bounds for Fourier Neural Operators

arXiv.org Machine LearningJul-2-2026

We establish approximation and learning guarantees for Fourier neural operators (FNOs) applied to time-$T$ solution operators of dissipative evolution equations. The analysis builds on the premise that FNOs can efficiently approximate and learn solution operators whenever these operators admit stable and accurate spectral discretizations. To formalize this idea, we introduce classes of evolution operators defined through spectral methods and derive FNO approximation bounds and polynomial sample complexity guarantees for these classes. For equations with polynomial nonlinearities, the learning rates depend primarily on the smoothness of the input space and the dimension of the physical domain. Our results hold uniformly over broad families of dissipative equations, rather than for a single fixed PDE, and apply in particular to the Navier--Stokes, Allen--Cahn, and Cahn--Hilliard equations. For equations with non-polynomial smooth nonlinearities, we prove that polynomial sample complexity still holds with rates that now additionally depend on the smoothness of the nonlinear terms and the dissipation strength. Overall, we connect classical spectral approximation theory with modern operator learning and explain when FNOs can learn nonlinear evolution operators efficiently.

artificial intelligence, machine learning, operator, (18 more...)

2607.0032

Country: North America > United States > Illinois > Cook County > Chicago (0.40)

Genre: Research Report (0.64)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Martinez-Sermeno, Flor, Jaramillo, Arturo, Van Horebeek, Johan

Adjusted Wasserstein distances for bridging empirical and true distributions with applications to MDS

arXiv.org Machine LearningJun-30-2026

This paper examines how metric adjustments to Multidimensional Scaling (MDS) can enhance its effectiveness as a visual tool for pattern recognition. The distance under consideration, referred to as Max-D-SW, is an adjustment of the Max-Sliced Wasserstein distance. In contrast to the original formulation, which optimizes over single unit directions, Max-D-SW aggregates contributions over orthonormal bases. This modification provides a clear numerical advantage in MDS outcomes, particularly when applied to heavy-tailed distributions. We also establish sample-complexity bounds showing that Max-D-SW remains statistically tractable, with rates comparable to those of its max-sliced counterpart. Moreover, we show that a better sample complexity for a metric does not necessarily translate into better performance when the metric is used as an input for MDS.

artificial intelligence, machine learning, pattern recognition, (16 more...)

2606.29665

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)

Kocabay, Şuayp Talha, Akkuş, Talha Rüzgar, Yalçın, Kerem

Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees

arXiv.org Machine LearningJun-30-2026

Scientific discovery via symbolic regression is often viewed as statistically and computationally intractable because the hypothesis space of expressions grows combinatorially with depth. This paper revisits the statistical side through the lens of PAC learning, focusing on compositional function trees built from a finite vocabulary of smooth operators (e.g., $\{+,\times,\sin,\exp\}$ and affine maps). We prove that the relevant generalization quantity, Rademacher complexity, hence the excess risk, does not necessarily blow up exponentially with the number of distinct symbolic structures, but is controlled by (i) the depth $d$ and (ii) the Lipschitz constants of the base operators along the composed computation graph. Concretely, under mild Lipschitz conditions on operators and bounded affine leaves, a finite-union bound over a vocabulary of size $K=|\mathcal{H}_{\mathrm{base}}|$ together with Maurer-type vector contraction yields $\mathfrak{R}_n(\mathcal{H}_{\mathrm{comp}}^{d}) \leq (Kb\sqrt{2}L)^{d-1}\mathfrak{R}_n(\mathcal{H}_{\mathrm{comp}}^{1})$ with arity bound $b$; corresponding high-probability risk bounds scale as $\mathcal{O}(L^{d}/\sqrt{n})$ when $K,b=O(1)$ and $\mathfrak{R}_n(\mathcal{H}_{\mathrm{comp}}^{1})=O(n^{-1/2})$. We complement the theory with a modular codebase that trains differentiable operator trees (not MLPs) on synthetic "physics-like" targets of controlled depth and shows that the empirical generalization gap correlates positively with the predicted complexity term $(\widehat{L}^{d})/\sqrt{n}$.

artificial intelligence, deep learning, machine learning, (16 more...)

2606.29331

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Liu, Haitong, Sridharan, Deepak Narayanan, Steurer, David, Wiedmer, Manuel

Fast algorithms for learning a Gaussian under halfspace truncation with optimal sample complexity

arXiv.org Machine LearningJun-26-2026

We study the fundamental problem of learning a high-dimensional Gaussian truncated to an unknown halfspace. Lee, Mehrotra and Zampetakis (FOCS'24) recently obtained the first polynomial time algorithm for this problem, but their resulting sample and time complexity bounds are not optimal. Under non-trivial truncation, for any target accuracy $\varepsilon > 0$ and dimension $d$ we give an efficient algorithm that uses $n = \tilde{O}(d^2/\varepsilon^2)$ samples and learns the underlying Gaussian to error $\varepsilon$ in total variation distance. Our algorithm is also fast: its runtime is dominated by the cost of computing the empirical covariance matrix. Both our sample and time complexity are optimal in terms of $d$ and $\varepsilon$ even without truncation: in this regard, we can learn a Gaussian under halfspace truncation for free. The key ingredient behind our result is a novel reinterpretation of the low-degree moments of the truncated Gaussian in terms of a relative truncation parameter. This relative truncation parameter uniquely determines the parameters of the untruncated Gaussian and enables direct parameter recovery. This reinterpretation allows us to circumvent the time intensive projected stochastic gradient descent procedure that is widely used in learning under truncation.

artificial intelligence, claim 3, machine learning, (17 more...)

2606.27298

Country: North America > United States (1.00)

Genre:

Research Report (0.70)
Instructional Material (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Pla, Corentin, Richard, Hugo, Abeille, Marc, Perchet, Vianney

Minimax PAC Bounds for Learning in Exogenous Contextual MDPs

arXiv.org Machine LearningJun-25-2026

We study PAC learning in tabular discounted Markov decision processes with exogenous i.i.d. contexts, with discount factor $γ$, finite state space $\mathcal X$, action space $\mathcal A$, and context space $\mathcal Z$. At each time step, a context is drawn independently from an unknown distribution $μ$ and revealed before the agent acts. This context may affect both rewards and transitions, while remaining uncontrolled by the agent. Depending on the regime, the learner has access either to a sampling oracle for $μ$, to a sampling oracle for the transition kernel conditioned on state-context-action tuples, or to both. Oracles can be accessed before and during policy execution. The sample complexity is measured by a couple $(n,m)$, where $n$ is the number of calls to the sampling oracles before execution and $m$ is the number of calls to the sampling oracles during execution. When rewards and transitions are known and only the context distribution $μ$ is sampled, we give a variance-reduced algorithm that solves policy evaluation (PE), best-value estimation (BVE), and best-policy extraction (BPE) with $\left(\widetilde O\left(1/((1-γ)^3\varepsilon^2)\right), 0 \right) $ sample complexity. The rate is independent of $|\mathcal Z|$ and minimax optimal up to logarithmic factors. As a corollary, we also obtain tight rates in the case of one-step perfect look-ahead, improving upon the existing guarantees. In the fully unknown regime, where both $μ$ and P must be learned, we show that PE remains $|\mathcal Z|$-free, with matching upper and lower bounds $\bigl(\widetilde O(|\mathcal X|/((1-γ)^3\varepsilon^2)),\, \widetilde O(1/((1-γ)^2\varepsilon^2))\bigr)$.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2606.2517

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsJun-23-2026, 11:52:40 GMT

Hadamard Test is Sufficient for Efficient Quantum Gradient Estimation with Lie Algebraic Symmetries

Gradient estimation is a central challenge in training parameterized quantum circuits (PQCs) for hybrid quantum-classical optimization and learning problems. This difficulty arises from several factors, including the exponential dimensionality of the Hilbert spaces and the information loss in quantum measurements. Existing estimators, such as finite difference and the parameter shift rule, often fail to adequately address these challenges for certain classes of PQCs. In this work, we propose a novel gradient estimation framework that leverages the underlying Lie algebraic structure of PQCs, combined with the Hadamard test. By analyzing the differential of the matrix exponential in Lie algebras, we derive an expression for the gradient as a linear combination of expectation values obtained via Hadamard tests. The coefficients in this decomposition depend solely on the circuit's parameterization and can be computed efficiently. Furthermore, these expectation values can be estimated using state-of-the-art shadow tomography techniques. Our approach enables efficient gradient estimation, requiring a number of measurement shots that scales logarithmically with the number of parameters, and with polynomial classical and quantum time. This is an exponential reduction in the measurement cost and a polynomial speed-up in time compared to existing works.

artificial intelligence, hadamard test, machine learning, (17 more...)

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing SystemsJun-23-2026, 04:19:34 GMT

Efficient PACLearning for Realizable-Statistic Models via Convex Surrogates

A central question in the theory of machine learning concerns the identification of classes of data distributions for which one can provide computationally efficient learning algorithms with provable statistical learning guarantees. Indeed, in the context of probably approximately correct (PAC) learning, there has been much interest in exploring intermediate PAC learning models that, unlike the realizable PAC learning setting, allow for some stochasticity in the labels, and unlike the fully agnostic PAC learning setting, also admit computationally efficient learning algorithms with finite sample complexity bounds. Some examples of such models include random classification noise (RCN), probabilistic concepts, Massart noise, and generalized linear models (GLMs); in general, most of this work has focused on binary classification problems. In this paper, we study what we call realizablestatistic models (RSMs), wherein we allow stochastic labels but assume that some vector-valued statistic of the conditional label distribution comes from some known function class. RSMs are a flexible class of models that interpolate between the realizable and fully agnostic settings, and that also recover several previously studied models as special cases.

algorithm, artificial intelligence, machine learning, (15 more...)

Country: Europe > United Kingdom (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Neural Information Processing SystemsJun-23-2026, 03:42:21 GMT

AFinite Sample Analysis of Distributional TD Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy π. Previous works on statistical analysis of distributional TD learning mainly focus on the tabular case. In contrast, we first consider the linear function approximation setting and derive sharp finite-sample rates. Our theoretical results demonstrate that the sample complexity of linear distributional TD learning matches that of classic linear TD learning. This implies that, with linear function approximation, learning the full distribution of the return from streaming data is no more difficult than learning its expectation (value function). To derive tight sample complexity bounds, we conduct a fine-grained analysis of the linear-categorical Bellman equation and employ the exponential stability arguments for products of random matrices. Our results provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Neural Information Processing SystemsJun-23-2026, 02:30:05 GMT

Sample-Adaptivity Tradeoff in On-Demand Sampling

We study the tradeoff between sample complexity and round complexity in ondemand sampling, where the learning algorithm adaptively samples from k distributions over a limited number of rounds. In the realizable setting of MultiDistribution Learning (MDL), we show that the optimal sample complexity of an r-round algorithm scales approximately as dkΘ(1/r)/ε. For the general agnostic case, we present an algorithm that achieves near-optimal sample complexity of eO((d + k)/ε2) within eO( k) rounds. Of independent interest, we introduce a new framework, Optimization via On-Demand Sampling (OODS), which abstracts the sample-adaptivity tradeoff and captures most existing MDL algorithms. We establish nearly tight bounds on the round complexity in the OODS setting. The upper bounds directly yield the eO( k)-round algorithm for agnostic MDL, while the lower bounds imply that achieving sub-polynomial round complexity would require fundamentally new techniques that bypass the inherent hardness of OODS.

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

Europe (1.00)
North America > United States > California > Los Angeles County (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)